Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
In this work, we extend the instruction-tuned Llama-2 model with end-to-end
general-purpose speech processing and reasoning abilities while maintaining the
wide range of LLM capabilities, without using any carefully curated paired
data. The proposed model can utilize audio prompts as a replacement for text
and sustain a conversation. Such a model also has extended cross-modal
capabilities such as being able to perform speech question answering, speech
translation, and audio summarization amongst many other closed and open-domain
tasks. This is unlike prior approaches in speech, in which LLMs are extended to
handle audio for a limited number of pre-designated tasks. Experiments show
that our end-to-end approach is on par with or outperforms a cascaded system
(speech recognizer + LLM) in terms of modeling the response to a prompt.
Furthermore, unlike a cascade, our approach can interchange text and audio modalities and exploit the prior context in a conversation to provide better results.
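The cascade-versus-end-to-end contrast drawn above can be sketched in a few lines. This is a toy illustration, not the paper's actual system: `asr`, `llm`, and `speech_llm` are hypothetical stand-ins for the component models.

```python
# Cascaded baseline: a speech recognizer transcribes the audio, and a
# text-only LLM responds to the transcript. The LLM never sees the
# audio itself, so any prior audio context is lost at the transcript.
def cascaded_response(audio, asr, llm):
    return llm(asr(audio))

# End-to-end model: a single speech-capable LLM consumes the entire
# conversation history, where each turn may be text or audio, so prior
# context in either modality can shape the response.
def end_to_end_response(turns, speech_llm):
    return speech_llm(turns)
```

In the cascaded setup each reply is a function of one transcript; in the end-to-end setup the model sees the mixed-modality history as one sequence, which is what enables the modality interchange described above.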
Gendered nationalism : the gender gap in support for the Scottish National Party
Recent major surveys of the Scottish electorate and of Scottish National Party (SNP) members have revealed a distinct gender gap in support for the party. Men are markedly more likely than women to vote for the SNP, and they comprise more than two-thirds of its membership. In this article, we use data from those surveys to test various possible explanations for the disproportionately male support for the SNP. While popular accounts have focused on the gendered appeal of recent leaders and on the party's fluctuating efforts at achieving gender equality in its parliamentary representation, we find much stronger support for a different explanation. Women are less inclined to support and to join the SNP because they are markedly less supportive of its central objective of independence for Scotland. Since men and women barely differ in their reported national identities, the origins of this gender gap in support for independence present a puzzle for further research.
Metropolitan Briefing Book, 2007
The Institute of Portland Metropolitan Studies (IMS) was created to connect the resources of higher education to the needs of the six-county, bi-state Portland-Vancouver metropolitan area (Clackamas, Clark, Columbia, Multnomah, Washington, and Yamhill Counties). In this spirit, we offer our 2007 Metropolitan Briefing Book. Our theme is regional variety. Variety has been touted as the very spice of life (William Cowper) and as the mother of enjoyment (Vivian Grey). Our region enjoys a good deal of variety--in its landscapes, in its economy, and in its people, their cultures, and their attitudes. These differences are important to local vitality and beauty. But while we generally view this variety as positive, we also worry about equity. Although we promote regional thought and action, we must understand that each community experiences the problems facing us in a slightly different way and often with significantly different resources.
Prompting Large Language Models with Speech Recognition Abilities
Large language models have proven themselves highly flexible, able to solve a
wide range of generative tasks, such as abstractive summarization and
open-ended question answering. In this paper, we extend the capabilities of LLMs by directly attaching a small audio encoder, allowing them to perform speech recognition. By prepending a sequence of audio embeddings to the text token embeddings, the LLM can be converted into an automatic speech recognition (ASR) system and used in exactly the same manner as its textual counterpart.
Experiments on Multilingual LibriSpeech (MLS) show that incorporating a
conformer encoder into the open-source LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies investigating whether the LLM can be kept completely frozen during training to preserve its original capabilities, the effect of scaling up the audio encoder, and the effect of increasing the audio encoder stride to generate fewer embeddings. These studies show that multilingual ASR is possible even when the LLM is frozen, and even with strides of almost one second in the audio encoder, opening up the possibility of LLMs operating on long-form audio.
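The prepending mechanism and the striding ablation described above can be illustrated with a toy sketch. Simple frame stacking stands in for the conformer encoder, and all names here are assumptions for illustration, not the paper's implementation.

```python
def encode_audio(frames, stride):
    """Toy audio encoder: stack `stride` consecutive frames into one
    embedding each, so a larger stride yields fewer embeddings for the
    LLM to attend over (the striding ablation discussed above)."""
    return [
        [x for frame in frames[i:i + stride] for x in frame]
        for i in range(0, len(frames) - stride + 1, stride)
    ]

def build_llm_input(audio_embeddings, text_embeddings):
    """Prepend the audio embeddings to the text token embeddings; the
    combined sequence is consumed by the LLM like ordinary text."""
    return audio_embeddings + text_embeddings
```

Doubling the stride roughly halves the number of audio embeddings, which is why large strides make long-form audio tractable for a fixed LLM context length.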
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
TODM: Train Once Deploy Many - Efficient Supernet-Based RNN-T Compression for On-Device ASR Models
Automatic Speech Recognition (ASR) models need to be optimized for specific
hardware before they can be deployed on devices. This can be done by tuning the
model's hyperparameters or exploring variations in its architecture.
Re-training and re-validating models after making these changes can be a
resource-intensive task. This paper presents TODM (Train Once Deploy Many), a
new approach to efficiently train many sizes of hardware-friendly on-device ASR
models with GPU-hours comparable to those of a single training job. TODM leverages insights from prior work on Supernets, where Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces the layer sizes and widths of the Supernet to obtain subnetworks, yielding smaller models suitable for all hardware types. We introduce a novel combination of
three techniques to improve the outcomes of the TODM Supernet: adaptive
dropouts, an in-place Alpha-divergence knowledge distillation, and the use of
ScaledAdam optimizer. We validate our approach by comparing Supernet-trained
versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T using
LibriSpeech. Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, with up to 3% relative improvement in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
Comment: Meta AI; Submitted to ICASSP 202
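The weight-sharing idea behind the Supernet can be sketched as follows, under the simplifying assumption that each layer is a square weight matrix and subnetworks are obtained by slicing. This is an illustration of the general technique, not the paper's RNN-T implementation.

```python
def subnetwork(supernet, num_layers, width):
    """Extract a smaller model from a trained Supernet by keeping the
    first `num_layers` layers and the top-left `width` x `width` slice
    of each layer's weights. Every subnetwork shares these weights, so
    one training run yields many deployable model sizes."""
    return [
        [row[:width] for row in layer[:width]]
        for layer in supernet[:num_layers]
    ]
```

Because each subnetwork is a view into the same trained parameters, deploying a new size is a slicing operation rather than a fresh training-and-validation cycle, which is where the "train once, deploy many" saving comes from.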
The First Provenance Challenge
The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
A Meta-Regression Analysis to Evaluate the Effects of Narasin on Grow-Finish Pig Performance
A meta-regression analysis was conducted to evaluate the effects of added narasin in growing-finishing pig diets and to predict its influence on average daily gain (ADG), feed efficiency (G:F), and carcass yield. A database was developed containing 21 technical reports, abstracts, and refereed papers from 2012 to 2021, representing 35 observations for growth performance data in studies ranging from 35 to 116 days in length (overall data). In addition, within these 35 observations, individual period data were evaluated (143 observations) using weekly, bi-weekly, or monthly performance intervals (period data). Regression model equations were developed, and predictor variables were assessed with a stepwise manual forward selection procedure. Important variables in predicting the response to added narasin included ADG, average daily feed intake (ADFI), and G:F of the control pigs, feeding duration (shorter or longer than 65 days), and body weight (greater than or less than 230 lb). Using median values from the database for predictor variables, the meta-analysis indicated narasin would be expected to improve ADG by 1.06 to 1.65%, G:F by 0.71 to 1.71%, and carcass yield by 0.31% when fed for longer than 65 days.
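The stepwise manual forward selection procedure mentioned above can be sketched generically: at each step, add the candidate predictor that most reduces the model's error, and stop when no remaining candidate improves the fit. This is a minimal sketch of the general technique; `fit_error` is a hypothetical scoring function, not the study's actual regression model.

```python
def forward_selection(candidates, fit_error):
    """Greedy forward selection: repeatedly add the candidate predictor
    that yields the lowest error, stopping when nothing improves."""
    selected = []
    best = fit_error(selected)
    improved = True
    while improved and candidates:
        improved = False
        scores = {p: fit_error(selected + [p]) for p in candidates}
        p_best = min(scores, key=scores.get)
        if scores[p_best] < best:
            selected.append(p_best)
            candidates = [p for p in candidates if p != p_best]
            best = scores[p_best]
            improved = True
    return selected
```

With predictors such as control-pig ADG, ADFI, and feeding duration as candidates, the procedure keeps only those that measurably improve the regression's fit, mirroring the variable screening described above.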